The Use of Transition Entropy in Partially Observable Markov Decision Processes
Abstract
In this report we describe a new POMDP algorithm, denoted TEQ-MDP, which computes the optimal policy of a modified MDP and uses that solution to select the action for the POMDP as a function of the belief state. The modified MDP includes state entropy information (transition entropy) in its reward structure so as to value actions that gather information. The algorithm is suitable for real-time implementation, since the main computational burden can be handled off-line. We present results from several tests in example environments from the literature and compare the performance of TEQ-MDP with that of another heuristic method, Q-MDP. In the situations where Q-MDP exhibits near-optimal behaviour, TEQ-MDP performs no worse. Furthermore, TEQ-MDP achieves near-optimal performance in particular cases where Q-MDP clearly fails.
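The two-stage structure described in the abstract (solve a modified MDP off-line, then choose actions from the belief state on-line) can be sketched as follows. This is a minimal illustration, not the report's implementation: the abstract does not give the exact form of the entropy term, so the sketch assumes a reward penalty proportional to the entropy of the next-state distribution, and the function names, the `weight` parameter, and the array layout are all hypothetical.

```python
import numpy as np

def transition_entropy(T):
    """Entropy (in nats) of the next-state distribution for each (action, state).

    T has shape (A, S, S) with T[a, s, s2] = P(s2 | s, a).
    """
    safe = np.where(T > 0, T, 1.0)  # log(1) = 0, so zero-probability terms vanish
    return -(T * np.log(safe)).sum(axis=2)  # shape (A, S)

def value_iteration(T, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Q-values of a fully observable MDP; R and the result have shape (A, S)."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(max_iter):
        V = Q.max(axis=0)              # greedy state values, shape (S,)
        Q_new = R + gamma * (T @ V)    # Bellman backup, shape (A, S)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new
    return Q

def teq_mdp_q(T, R, weight=1.0, gamma=0.95):
    """Off-line step. ASSUMPTION: the modified reward is R - weight * entropy."""
    return value_iteration(T, R - weight * transition_entropy(T), gamma)

def select_action(Q, belief):
    """On-line step, Q-MDP style: argmax over a of sum_s b(s) * Q(a, s)."""
    return int(np.argmax(Q @ belief))
```

Only `select_action` runs at decision time, and it costs one matrix-vector product per step, which is consistent with the abstract's point that the main computational burden can be handled off-line. Setting `weight=0` in this sketch reduces it to plain Q-MDP.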
Similar resources
Transition Entropy in Partially Observable Markov Decision Processes
This paper proposes a new heuristic algorithm suitable for real-time applications using partially observable Markov decision processes (POMDP). The algorithm is based on a reward shaping strategy which includes entropy information in the reward structure of a fully observable Markov decision process (MDP). This strategy, as illustrated by the presented results, exhibits near-optimal performance...
A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems
Maintenance can either increase or decrease a system's availability, so it is worthwhile to evaluate a maintenance policy from the cost and availability points of view simultaneously, according to the decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...
Estimation of Channel State Transition Probabilities Based on Markov Chains in Cognitive Radio
Prediction of spectrum sensing and access is one of the keys in cognitive radio (CR). It is necessary to know the channel state transition probabilities to predict the spectrum. By the use of the model of partially observable Markov decision process (POMDP), this paper addressed the spectrum sensing and access in cognitive radio and proposed an estimation algorithm of channel state transition...
A POMDP Model for Guiding Taxi Cruising in a Congested Urban City
We consider a partially observable Markov decision process (POMDP) model for improving a taxi agent cruising decision in a congested urban city. Using real-world data provided by a large taxi company in Singapore as a guide, we derive the state transition function of the POMDP. Specifically, we model the cruising behavior of the drivers as continuous-time Markov chains. We then apply dynamic pr...
The Duality of State and Observation in Probabilistic Transition Systems
In this paper we consider the problem of representing and reasoning about systems, especially probabilistic systems, with hidden state. We consider transition systems where the state is not completely visible to an outside observer. Instead, there are observables that partly identify the state. We show that one can interchange the notions of state and observation and obtain what we call a dual ...